CUDA Programming Guide: Fundamentals of CUDA Kernel Development

CUDA kernel development begins with defining a kernel: a specialized C++ function designed to execute in parallel across the large number of cores on an NVIDIA GPU. Kernels are the basic unit of work in the CUDA programming model, and they act as the bridge where sequential logic on the host hands off to massively parallel execution on the device.

1. The __global__ specifier

The __global__ specifier is essential: it tells the compiler to generate code for the GPU while keeping the function's entry point visible to the CPU. Functions that execute on the GPU and can be called from the host are called kernels.
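As a minimal sketch of the idea above, the following program declares a kernel with __global__ and launches it from host code. The names hello_kernel and launch_hello, and the launch configuration of 2 blocks with 4 threads each, are illustrative assumptions and do not come from the original text. The order of the printed lines is not guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks this function as a kernel: it is compiled for the GPU
// but launched from host (CPU) code.
__global__ void hello_kernel() {
    printf("hello from thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

// Hypothetical host helper: launches the kernel and reports any CUDA error.
cudaError_t launch_hello() {
    hello_kernel<<<2, 4>>>();              // 2 blocks of 4 threads each (arbitrary choice)
    cudaError_t err = cudaGetLastError();  // catches invalid launch configurations
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // wait until the kernel has finished
}

int main() {
    cudaError_t err = launch_hello();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compiling with nvcc (for example `nvcc hello.cu -o hello`) should be enough for a sketch of this size, assuming a CUDA toolkit and a compatible GPU are available.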

2. Execution environment

Kernels are distributed to and executed on Streaming Multiprocessors (SMs). The SM is the main compute engine inside an NVIDIA GPU and manages hundreds to thousands of concurrent threads, depending on the architecture. Each SM is assigned thread blocks and schedules their threads to run on its processing cores.
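To see what this looks like on a particular machine, the sketch below asks the CUDA runtime how many SMs a device has and how many threads each SM can keep resident. The helper name query_sm_count and the choice of device 0 are assumptions made for illustration. The reported numbers vary by GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints SM-related properties for one device and
// returns its SM count, or -1 if the query fails.
int query_sm_count(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    std::printf("Device %d: %s\n", device, prop.name);
    std::printf("  SM count:                %d\n", prop.multiProcessorCount);
    std::printf("  Max threads per SM:      %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
    return prop.multiProcessorCount;
}

int main() {
    // Device 0 is assumed to be the GPU of interest.
    return query_sm_count(0) > 0 ? 0 : 1;
}
```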

Syntax rule: kernels must have a return type of void. Because they run asynchronously with respect to the host, they cannot return a value directly to the CPU; results must be written to allocated device memory, from which the host can copy them back.
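The rule above can be sketched as follows: the kernel returns void and writes its results through a pointer to device memory, and the host copies them back afterwards. The names square_kernel and square_on_gpu, and the block size of 256 threads, are illustrative assumptions rather than anything the original text prescribes.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The kernel returns void; each thread writes its result into device memory.
__global__ void square_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        out[i] = in[i] * in[i];
    }
}

// Hypothetical host helper: squares every element of `in` on the GPU and
// stores the results in `out`. Returns false if any CUDA call fails.
bool square_on_gpu(const std::vector<float> &in, std::vector<float> &out) {
    int n = static_cast<int>(in.size());
    size_t bytes = in.size() * sizeof(float);
    out.resize(in.size());
    if (n == 0) return true;

    float *d_in = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    bool ok = cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;                         // arbitrary block size
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        square_kernel<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // This blocking copy on the default stream runs after the kernel completes.
        ok = cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    std::vector<float> in = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> out;
    if (!square_on_gpu(in, out)) {
        std::fprintf(stderr, "GPU computation failed\n");
        return 1;
    }
    for (float v : out) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```

The launch itself returns control to the host immediately, which is why the result is only read after the device-to-host copy.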

QUESTION 1

What is the primary function of the __global__ specifier?

It defines a function that runs on the CPU but is callable from the GPU.

It defines a kernel that runs on the GPU and is callable from the CPU.

It allocates memory on the GPU's SM cache.

It synchronizes all threads in a block.

Answer: It defines a kernel that runs on the GPU and is callable from the CPU. __global__ identifies entry-point kernels that execute on the GPU and are launched from host code.

QUESTION 2

Why must CUDA kernels return void?

Because they execute asynchronously and have no direct path to return values to the Host thread.

To save registers on the SM.

Because GPU memory is read-only.

The NVCC compiler does not support float returns.

QUESTION 3

Which hardware component is responsible for managing and executing threads in a CUDA kernel?

The PCIe Controller.

The Streaming Multiprocessor (SM).

The Host RAM controller.

The BIOS.

QUESTION 4

What happens when a Host calls a kernel function?

The CPU halts until the GPU finishes processing.

The GPU creates a clone of the function for every available SM.

The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.

The CPU performs a context switch to the GPU.

QUESTION 5

Which of the following is the correct definition of a CUDA kernel?

A function that executes on the GPU and is invoked from the Host.

A C++ library for file I/O.

A hardware driver for NVIDIA GPUs.

A standard CPU function with the __gpu__ prefix.
